E-book
Author Bécue-Bertaut, Mónica

Title Textual Statistics with R
Published Milton : Chapman and Hall/CRC, 2019

Description 1 online resource (213 pages)
Series Chapman and Hall/CRC Computer Science and Data Analysis Ser
Contents Cover; Half Title; Title Page; Copyright Page; Table of Contents; Foreword; Preface; 1: Encoding: from a corpus to statistical tables; 1.1 Textual and contextual data; 1.1.1 Textual data; 1.1.2 Contextual data; 1.1.3 Documents and aggregate documents; 1.2 Examples and notation; 1.3 Choosing textual units; 1.3.1 Graphical forms; 1.3.2 Lemmas; 1.3.3 Stems; 1.3.4 Repeated segments; 1.3.5 In practice; 1.4 Preprocessing; 1.4.1 Unique spelling; 1.4.2 Partially automated preprocessing; 1.4.3 Word selection; 1.5 Word and segment indexes; 1.6 The Life_UK corpus: preliminary results
1.6.1 Verbal content through word and repeated segment indexes; 1.6.2 Univariate description of contextual variables; 1.6.3 A note on the frequency range; 1.7 Implementation with Xplortext; 1.8 Summary; 2: Correspondence analysis of textual data; 2.1 Data and goals; 2.1.1 Correspondence analysis: a tool for linguistic data analysis; 2.1.2 Data: a small example; 2.1.3 Objectives; 2.2 Associations between documents and words; 2.2.1 Profile comparisons; 2.2.2 Independence of documents and words; 2.2.3 The χ² test; 2.2.4 Association rates between documents and words
2.3 Active row and column clouds; 2.3.1 Row and column profile spaces; 2.3.2 Distributional equivalence and the χ² distance; 2.3.3 Inertia of a cloud; 2.4 Fitting document and word clouds; 2.4.1 Factorial axes; 2.4.2 Visualizing rows and columns; 2.4.2.1 Category representation; 2.4.2.2 Word representation; 2.4.2.3 Transition formulas; 2.4.2.4 Simultaneous representation of rows and columns; 2.5 Interpretation aids; 2.5.1 Eigenvalues and representation quality of the clouds; 2.5.2 Contribution of documents and words to axis inertia; 2.5.3 Representation quality of a point
2.6 Supplementary rows and columns; 2.6.1 Supplementary tables; 2.6.2 Supplementary frequency rows and columns; 2.6.3 Supplementary quantitative and qualitative variables; 2.7 Validating the visualization; 2.8 Interpretation scheme for textual CA results; 2.9 Implementation with Xplortext; 2.10 Summary of the CA approach; 3: Applications of correspondence analysis; 3.1 Choosing the level of detail for analyses; 3.2 Correspondence analysis on aggregate free text answers; 3.2.1 Data and objectives; 3.2.2 Word selection; 3.2.3 CA on the aggregate table; 3.2.3.1 Document representation
3.2.3.2 Word representation; 3.2.3.3 Simultaneous interpretation of the plots; 3.2.4 Supplementary elements; 3.2.4.1 Supplementary words; 3.2.4.2 Supplementary repeated segments; 3.2.4.3 Supplementary categories; 3.2.5 Implementation with Xplortext; 3.3 Direct analysis; 3.3.1 Data and objectives; 3.3.2 The main features of direct analysis; 3.3.3 Direct analysis of the culture question; 3.3.4 Implementation with Xplortext; 4: Clustering in textual data science; 4.1 Clustering documents; 4.2 Dissimilarity measures between documents; 4.3 Measuring partition quality
4.3.1 Document clusters in the factorial space
Notes Print version record
Form Electronic book
ISBN 9781351816366
1351816365